Add Phase 3c run manifests and publication identity #855

Draft
anth-volk wants to merge 9 commits into main from feat/phase-3c-step-manifests

Collaborator

anth-volk commented on May 1, 2026

Fixes #860

Also fixes #854.

Summary

Implements the Phase 3c execution-ledger boundary by adding typed run and step manifests. Pipeline runs now write run_manifest.json plus per-step JSON manifests under /pipeline/runs/{run_id}/steps/, with declared inputs, parameters, outputs, diagnostics, checksums, reuse decisions, attempts, timings, and failure information.
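As a rough illustration of the layout described above, the sketch below writes one per-step JSON manifest under pipeline/runs/{run_id}/steps/. The StepManifest fields, the helper names, and the use of SHA-256 are assumptions made for this example; the actual schema is defined in policyengine_us_data/utils/step_manifest.py and may differ.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class StepManifest:
    # Hypothetical field names; the PR's real manifest schema may differ.
    run_id: str
    step_name: str
    status: str = "pending"
    inputs: dict = field(default_factory=dict)
    parameters: dict = field(default_factory=dict)
    outputs: dict = field(default_factory=dict)  # relative path -> sha256 hex digest
    attempts: int = 0
    error: Optional[str] = None


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large outputs are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_step_manifest(root: Path, manifest: StepManifest) -> Path:
    """Write the manifest to <root>/pipeline/runs/<run_id>/steps/<step_name>.json."""
    steps_dir = root / "pipeline" / "runs" / manifest.run_id / "steps"
    steps_dir.mkdir(parents=True, exist_ok=True)
    target = steps_dir / f"{manifest.step_name}.json"
    target.write_text(json.dumps(asdict(manifest), indent=2, sort_keys=True))
    return target
```

Keeping one file per step means a failed or retried step can rewrite its own manifest without touching the records of steps that already completed.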

Extends Phase 3c with publication identity plumbing across GitHub, Modal, and Hugging Face staging:

  • GitHub now resolves a safe publication_id before Modal starts.
  • Pipeline runs use that value as the common run namespace when provided.
  • Modal app and volume names can be publication-scoped.
  • Run and step manifests record the publication context.
  • HF staging uploads write to staging/{publication_id}/... and include _publication_context.json.
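To make the idea of a "safe" publication_id concrete, here is a minimal sketch that derives one from standard GitHub Actions environment variables and builds the matching staging path. The identifier format (short SHA, run ID, run attempt) and both function names are assumptions for illustration; the scheme actually used by .github/scripts/resolve_publication_context.py is not shown in this description.

```python
import re


def resolve_publication_id(env: dict) -> str:
    """Derive a lowercase, hyphen-separated identifier from GitHub run metadata."""
    # Assumed format: <short sha>-<run id>-<run attempt>. The real script may differ.
    raw = "-".join(
        [
            env["GITHUB_SHA"][:12],
            env["GITHUB_RUN_ID"],
            env.get("GITHUB_RUN_ATTEMPT", "1"),
        ]
    )
    # Collapse anything outside [a-z0-9-] so the value is safe in paths and names.
    safe = re.sub(r"[^a-z0-9-]+", "-", raw.lower()).strip("-")
    if not safe:
        raise ValueError("could not derive a publication_id")
    return safe


def staging_path(publication_id: str, filename: str) -> str:
    """Build the Hugging Face staging path for a file in this publication."""
    return f"staging/{publication_id}/{filename}"
```

Resolving the identifier once, before any Modal work starts, is what lets the run namespace, the Modal resource names, and the staging prefix all agree on the same value.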

Changes the release workflow coupling so that push.yaml explicitly dispatches pipeline.yaml via workflow_dispatch once the dataset build and versioning steps complete. The dispatched pipeline receives the shared publication_id plus the exact post-version-bump source_sha, and pipeline.yaml is no longer triggered implicitly by "Update package version" push events.
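For readers unfamiliar with explicit dispatch, the sketch below builds (but does not send) a request to GitHub's REST endpoint for creating a workflow dispatch event. How push.yaml actually performs the dispatch is not stated here, so the function name, the choice of ref, and the input keys publication_id and source_sha are assumptions for this example.

```python
import json
import urllib.request


def build_dispatch_request(
    repo: str,
    workflow_file: str,
    ref: str,
    publication_id: str,
    source_sha: str,
    token: str,
) -> urllib.request.Request:
    """Prepare a workflow_dispatch call; pass the result to urlopen() to send it."""
    url = (
        f"https://api.github.com/repos/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    # Input names are assumed; they must match the inputs declared in the workflow.
    body = {
        "ref": ref,
        "inputs": {"publication_id": publication_id, "source_sha": source_sha},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
```

Passing source_sha as an input, rather than relying on the triggering event, pins the pipeline to the commit produced by the version bump instead of whatever happens to be at the branch tip when the run starts.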

The pipeline now uses manifest-backed output validation for reuse decisions, records H5 scope fingerprints inside regional/national H5 step manifests, records partial H5 reuse counts, records data-build checkpoint hit/miss counts, and validates completed step outputs before release promotion.
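One way to picture manifest-backed reuse: a step is skipped only if every output recorded in its manifest still exists and still matches its recorded checksum. The sketch below assumes the manifest stores a mapping of relative paths to SHA-256 digests; the function names and that mapping shape are hypothetical, and the PR's real validation also covers H5 scope fingerprints, which are not modeled here.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def can_reuse_step(recorded_outputs: dict, root: Path) -> tuple[bool, list[str]]:
    """Check recorded outputs against disk and explain any reason to rerun."""
    reasons = []
    if not recorded_outputs:
        # A manifest with no outputs gives nothing to validate, so rerun.
        reasons.append("manifest records no outputs")
    for rel_path, expected in recorded_outputs.items():
        candidate = root / rel_path
        if not candidate.is_file():
            reasons.append(f"missing: {rel_path}")
        elif file_sha256(candidate) != expected:
            reasons.append(f"checksum mismatch: {rel_path}")
    return (not reasons, reasons)
```

Returning the reasons alongside the boolean makes it straightforward to record why a step was rerun as part of the reuse decision in its manifest.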

Verification

  • uv run --no-sync pytest tests/unit/test_publication_context.py tests/unit/utils/test_data_upload.py tests/unit/test_release_manifest.py tests/unit/test_step_manifest.py tests/unit/test_modal_data_build.py tests/unit/test_pipeline.py -q
  • uv run --no-sync pytest tests/unit/test_publication_context.py tests/unit/test_pipeline.py -q
  • uv run --no-sync ruff check .github/scripts/resolve_publication_context.py .github/scripts/spawn_modal_pipeline.py policyengine_us_data/utils/publication_context.py policyengine_us_data/utils/data_upload.py policyengine_us_data/utils/release_manifest.py policyengine_us_data/utils/step_manifest.py policyengine_us_data/storage/upload_completed_datasets.py modal_app/pipeline.py modal_app/data_build.py modal_app/local_area.py modal_app/remote_calibration_runner.py modal_app/h5_test_harness.py tests/unit/test_publication_context.py tests/unit/utils/test_data_upload.py tests/unit/test_release_manifest.py
  • uv run --no-sync ruff check .github/scripts/spawn_modal_pipeline.py modal_app/pipeline.py
  • uv run --no-sync python scripts/run_quality_guards.py
  • uv run --no-sync python -m py_compile .github/scripts/resolve_publication_context.py .github/scripts/spawn_modal_pipeline.py policyengine_us_data/utils/publication_context.py policyengine_us_data/utils/data_upload.py policyengine_us_data/utils/release_manifest.py policyengine_us_data/utils/step_manifest.py policyengine_us_data/storage/upload_completed_datasets.py modal_app/pipeline.py modal_app/data_build.py modal_app/local_area.py modal_app/remote_calibration_runner.py modal_app/h5_test_harness.py
  • GITHUB_RUN_ID=123456789 GITHUB_RUN_ATTEMPT=1 GITHUB_SHA=abcdef1234567890 GITHUB_REPOSITORY=PolicyEngine/policyengine-us-data GITHUB_SERVER_URL=https://github.com GITHUB_WORKFLOW='Run Pipeline' GITHUB_REF=refs/heads/main GITHUB_REF_NAME=main MODAL_ENVIRONMENT=main US_DATA_MODAL_APP_PREFIX=policyengine-us-data-pub .venv/bin/python .github/scripts/resolve_publication_context.py
  • git diff --check

Local full uv run remains blocked on this Intel macOS environment because the locked torch==2.9.1 wheel is unavailable for macosx_x86_64. For targeted tests, I manually installed missing local-only test dependencies into the worktree venv.

anth-volk changed the title from "Add Phase 3c run-scoped step manifests" to "Add Phase 3c run manifests and publication identity" on May 1, 2026
Development

Successfully merging this pull request may close these issues.

  • Add run-scoped publication identity across GitHub, Modal, and Hugging Face
  • Implement Phase 3c run-scoped step manifests
